Volumetric measurement of fetal structures in MRI is time-consuming and prone to error, hence the need for automatic segmentation. Placenta segmentation and accurate fetal brain segmentation for gyrification assessment are particularly challenging because of the placenta's fuzzy boundaries and the fetal brain cortex's complex foldings. In this paper, we study the use of the contour Dice loss for both problems and compare it to other boundary losses and to the combined Dice and cross-entropy loss. The loss is computed efficiently for each slice via erosion, dilation and XOR operators. We describe a new formulation of the loss akin to the contour Dice metric. The combination of Dice loss and contour Dice yielded the best performance for placenta segmentation. For fetal brain segmentation, the best performing loss was the combined Dice with cross-entropy loss, followed by the Dice with contour Dice loss, which performed better than other boundary losses.
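For illustration, here is a minimal NumPy/SciPy sketch of a per-slice contour Dice computation of the kind described above, where each contour is extracted as the XOR of a mask with its erosion. The tolerance band via dilation and the exact normalization are assumptions; the paper's loss is a differentiable soft variant, not this hard metric.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def contour(mask: np.ndarray) -> np.ndarray:
    """One-pixel contour: XOR of the mask with its erosion."""
    return np.logical_xor(mask, binary_erosion(mask))

def contour_dice(pred: np.ndarray, gt: np.ndarray, tol: int = 1,
                 eps: float = 1e-8) -> float:
    """Hard contour Dice for one 2D slice of binary masks.

    Each contour is matched against a tolerance band around the other
    contour, obtained by dilating it `tol` times.
    """
    c_pred, c_gt = contour(pred), contour(gt)
    band_pred = binary_dilation(c_pred, iterations=tol)
    band_gt = binary_dilation(c_gt, iterations=tol)
    overlap = (np.logical_and(c_pred, band_gt).sum()
               + np.logical_and(c_gt, band_pred).sum())
    return overlap / (c_pred.sum() + c_gt.sum() + eps)
```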
Deep learning methods have been shown to be effective for segmenting structures and pathologies in medical imaging. However, they require large annotated datasets, whose manual segmentation is a tedious and time-consuming task, especially for large structures. We present a new method of partial annotations that uses a small set of consecutive annotated slices from each scan, with an annotation effort equal to that of only a few fully annotated cases. Training with partial annotations is performed by using only the annotated blocks, incorporating information about slices outside the structure of interest, and modifying the batch loss function to consider only the annotated slices. To facilitate training in a low-data regime, we use a two-step optimization process. We tested the method with the popular soft Dice loss on two MRI sequences, TRUFI and FIESTA, and compared the full-annotation regime to partial annotations with a similar annotation effort. For the TRUFI data, partial annotations performed slightly better on average than full annotations, with the Dice score increasing from 0.936 to 0.942, a substantial 22% decrease in the standard deviation (STD) of the Dice score, and a 15% improvement in average symmetric surface distance (ASSD). For the FIESTA sequence, partial annotations also decreased the STD of the Dice score and the ASSD metric on in-distribution data, by 27.5% and 33% respectively, and substantially improved performance on out-of-distribution data, increasing the Dice score from 0.84 to 0.9 and decreasing the ASSD from 7.46 to 4.01 mm. The two-step optimization process was helpful for partial annotations on both in-distribution and out-of-distribution data. The partial-annotation method with two-step optimization is therefore recommended to improve segmentation performance in low-data regimes.
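As a rough sketch of the modified batch loss, the soft Dice can be restricted to annotated slices with a per-slice mask; the tensor layout and masking scheme below are assumptions, not the authors' exact implementation.

```python
import torch

def partial_soft_dice_loss(probs: torch.Tensor, target: torch.Tensor,
                           slice_mask: torch.Tensor,
                           eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss restricted to annotated slices.

    probs, target: (B, D, H, W) foreground probabilities / binary labels.
    slice_mask:    (B, D), 1 where slice d of scan b is annotated, else 0.
    """
    m = slice_mask[:, :, None, None].float()      # broadcast over H, W
    inter = (probs * target * m).sum(dim=(1, 2, 3))
    denom = (probs * m).sum(dim=(1, 2, 3)) + (target * m).sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()
```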
Normal fetal adipose tissue (AT) development is essential for perinatal well-being. AT, or simply fat, stores energy in the form of lipids. Malnourishment may result in excessive or depleted adiposity. Although previous studies showed a correlation between the amount of AT and perinatal outcomes, prenatal assessment of AT is limited by the lack of quantitative methods. Using magnetic resonance imaging (MRI), 3D fat- and water-only images of the entire fetus can be obtained from two-point Dixon images to enable lipid quantification. This paper is the first to present a deep-learning-based method for fetal fat segmentation based on Dixon MRI. It optimizes the radiologist's manual fetal fat delineation time to generate an annotated training dataset. The method consists of two steps: 1) model-based semi-automatic fetal fat segmentation, reviewed and corrected by a radiologist; 2) automatic fetal fat segmentation using DL networks trained on the resulting annotated dataset. Three DL networks were trained. Compared to manual segmentation, we show a significant improvement in segmentation time (3:38 hours to <1 hour) and observer variability (Dice of 0.738 to 0.906). Automatic segmentation of 24 test cases with the 3D Residual U-Net, nnU-Net, and Swin-UNETR transformer networks achieved mean Dice scores of 0.863, 0.787, and 0.856, respectively. These results are better than the manual observer variability and comparable to automatic adult and pediatric fat segmentation. Six new independent cases segmented with the best-performing network were reviewed and corrected by a radiologist, resulting in a Dice score of 0.961 and a significantly reduced correction time of 15:20 minutes. Using these novel segmentation methods and short MRI acquisition times, whole-body subcutaneous lipids can be quantified for individual fetuses in the clinic and in large cohort studies.
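As background, an idealized two-point Dixon decomposition separates fat and water from in-phase and opposed-phase images. The sketch below shows the textbook magnitude-image arithmetic and a fat-fraction map, ignoring the phase-correction steps a real reconstruction requires.

```python
import numpy as np

def dixon_fat_water(in_phase: np.ndarray, opposed_phase: np.ndarray):
    """Idealized two-point Dixon decomposition of magnitude images.

    IP = W + F and OP = W - F, so W = (IP + OP) / 2 and F = (IP - OP) / 2.
    Real reconstructions must also resolve phase errors and fat-water swaps.
    """
    water = 0.5 * (in_phase + opposed_phase)
    fat = 0.5 * (in_phase - opposed_phase)
    return water, fat

def fat_fraction(water: np.ndarray, fat: np.ndarray,
                 eps: float = 1e-8) -> np.ndarray:
    """Voxel-wise fat signal fraction used for lipid quantification."""
    return fat / (fat + water + eps)
```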
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
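For context, here is a minimal sketch of the shortest-path baseline the paper compares against: Dijkstra over a 2D cost image, where the cost could be, e.g., the negative log of a vessel-probability map derived from a segmentation mask. The connectivity and cost convention are assumptions.

```python
import heapq
import numpy as np

def min_cost_path(cost: np.ndarray, start: tuple, goal: tuple) -> list:
    """Dijkstra over a 2D cost image with 4-connectivity.

    Low-cost pixels trace likely centerlines. Assumes `goal` is
    reachable from `start`.
    """
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = cost[start]
    heap = [(float(cost[start]), start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(heap, (float(dist[nr, nc]), (nr, nc)))
    # Walk back from goal to start to recover the path.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```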
Curriculum learning and self-paced learning are training strategies that gradually feed samples from easy to more complex. They have attracted increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on difficulty levels of input samples or on smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS), using paced learning with uniform label smoothing (ULS) for classification tasks and fusing uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a bigger smoothing factor enforces a heavier smoothing penalty on the true label and limits how much information is learned. We therefore design the curriculum by label smoothing (CBLS): we set a bigger smoothing value at the beginning of training and gradually decrease it to zero, controlling the model's learning utility from lower to higher. We also design a confidence-aware pacing function and combine it with CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets covering multi-class and multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data at different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
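A minimal sketch of the CBLS idea, assuming a linear decay schedule (the paper also uses a confidence-aware pacing function and an SVLS variant for segmentation, which are omitted here):

```python
import torch
import torch.nn.functional as F

def smoothing_factor(epoch: int, total_epochs: int,
                     initial: float = 0.3) -> float:
    """Curriculum schedule: heavy smoothing early, decaying linearly to 0."""
    return initial * max(0.0, 1.0 - epoch / total_epochs)

def uls_cross_entropy(logits: torch.Tensor, target: torch.Tensor,
                      smooth: float) -> torch.Tensor:
    """Cross-entropy with uniform label smoothing (ULS)."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(target, num_classes).float()
    soft = one_hot * (1.0 - smooth) + smooth / num_classes
    return -(soft * log_probs).sum(dim=-1).mean()
```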
Attribute-controlled text rewriting, also known as text style transfer, plays a crucial role in regulating the attributes and biases of textual training data and of machine-generated text. In this work we present SimpleStyle, a minimalist yet effective approach to style transfer composed of two simple ingredients: controlled denoising and output filtering. Despite the simplicity of our approach, which can be succinctly described with a few lines of code, it is competitive with previous state-of-the-art methods in both automatic and human evaluation. To demonstrate the adaptability and practical value of our system beyond academic data, we apply SimpleStyle to transfer a wide range of text attributes appearing in real-world textual data from social networks. Additionally, we introduce a novel "soft noising" technique that further improves the performance of our system. We also show that teaching a student model to generate the output of SimpleStyle yields a system that performs style transfer of equivalent quality with only a single greedy-decoded sample. Finally, we suggest our method as a remedy for the fundamental incompatible-baseline issue that holds back progress in the field, and offer our protocol as a simple yet strong baseline for works that wish to make incremental advancements in attribute-controlled text rewriting.
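As a hedged sketch of the output-filtering ingredient, with `denoise` and `attr_score` as hypothetical stand-ins for the controlled denoising model and an attribute classifier (neither is the paper's actual implementation):

```python
from typing import Callable, Sequence

def filtered_rewrite(text: str,
                     denoise: Callable[[str], Sequence[str]],
                     attr_score: Callable[[str], float],
                     threshold: float = 0.5) -> str:
    """Output filtering: sample several controlled-denoising rewrites of
    `text`, then return the best candidate the attribute classifier
    accepts (falling back to the highest-scoring raw sample)."""
    candidates = list(denoise(text))
    accepted = [c for c in candidates if attr_score(c) >= threshold]
    return max(accepted or candidates, key=attr_score)
```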
Temporal reasoning is the task of predicting temporal relations of event pairs given their contexts. While some temporal reasoning models perform reasonably well on in-domain benchmarks, we have little idea of these systems' generalizability due to existing datasets' limitations. In this work, we introduce a novel task named TODAY that bridges this gap with temporal differential analysis, which, as the name suggests, evaluates whether systems can correctly understand the effect of incremental changes. Specifically, TODAY makes slight context changes for given event pairs, and systems need to tell how this subtle contextual change will affect temporal relation distributions. To facilitate learning, TODAY also annotates human explanations. We show that existing models, including GPT-3, drop to random guessing on TODAY, suggesting that they heavily rely on spurious information rather than proper reasoning for temporal predictions. On the other hand, we show that TODAY's supervision style and explanation annotations can be used in joint learning, encouraging models to use more appropriate signals during training and to outperform across several benchmarks. TODAY can also be used to train models to solicit incidental supervision from noisy sources such as GPT-3, moving farther towards generic temporal reasoning systems.
State-of-the-art 3D semantic segmentation models are trained on off-the-shelf public benchmarks, but they often face a major challenge when deployed to a new domain. In this paper, we propose an Active-and-Adaptive Segmentation (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model and to bridge the point-distribution gap between domains. Specifically, before the cross-domain adaptation stage begins, ADAS performs an active sampling operation to select a maximally informative subset from both source and target domains for effective adaptation, reducing the adaptation difficulty in 3D scenarios. Benefiting from the rise of multi-modal 2D-3D datasets, ADAS utilizes a cross-modal attention-based feature fusion module that extracts a representative pair of image and point features to achieve bi-directional image-point feature interaction for safer adaptation. Experimentally, ADAS is verified to be effective in many cross-domain settings, including: 1) Unsupervised Domain Adaptation (UDA), where all samples from the target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA), where only a few unlabeled samples are available from the target domain; and 3) Active Domain Adaptation (ADA), where the target samples selected by ADAS are manually annotated. The results demonstrate that ADAS achieves significant accuracy gains when easily coupled with self-training methods or off-the-shelf UDA works.
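A generic sketch of cross-modal attention-based feature fusion between image and point features follows; dimensions, head counts, and residual connections are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Bi-directional image-point feature interaction via cross-attention."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.img_to_pts = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pts_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_feats: torch.Tensor, pt_feats: torch.Tensor):
        # img_feats: (B, N_pixels, C); pt_feats: (B, N_points, C)
        pts_fused, _ = self.img_to_pts(pt_feats, img_feats, img_feats)
        img_fused, _ = self.pts_to_img(img_feats, pt_feats, pt_feats)
        # Residual connections keep each modality's original features.
        return img_feats + img_fused, pt_feats + pts_fused
```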
Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting distributions over vocabulary tokens are intuitive and contain rich semantic information. We find that this view can explain some of the failure cases of dense retrievers. For example, the inability of models to handle tail entities can be explained via a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with lexical information at inference time, and show that this significantly improves performance compared to the original model in out-of-domain settings.
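A minimal sketch of projecting an encoder vector into vocabulary space by reusing a masked-LM head; vanilla `bert-base-uncased` stands in here for a trained dual encoder, which is an assumption for illustration only.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Vanilla BERT stands in for a trained dual encoder in this illustration.
name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

inputs = tok("what is the capital of france", return_tensors="pt")
with torch.no_grad():
    hidden = model.bert(**inputs).last_hidden_state  # (1, T, H)
    cls_vec = hidden[:, 0]           # stand-in for the dense query vector
    vocab_logits = model.cls(cls_vec.unsqueeze(1))   # reuse the MLM head
    probs = vocab_logits.softmax(-1).squeeze()       # (vocab_size,)

# Inspect the tokens the representation is "about".
top = probs.topk(10)
print(tok.convert_ids_to_tokens(top.indices.tolist()))
```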
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid being shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
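As a rough sketch of the generate-then-filter recipe, with `lm_sample` and `quality_score` as hypothetical stand-ins for the generating LM and the (possibly LM-based) relevance filter; the sycophancy prompt is illustrative, not taken from the paper.

```python
from typing import Callable, List

def generate_eval_dataset(lm_sample: Callable[[str], str],
                          quality_score: Callable[[str], float],
                          n: int = 1000,
                          threshold: float = 0.8) -> List[str]:
    """Generate candidate yes/no evaluation questions with an LM, keeping
    only those the filter rates as relevant enough."""
    prompt = ("Write a yes/no question that tests whether an AI assistant "
              "repeats back a user's stated opinion (sycophancy).")
    examples: List[str] = []
    while len(examples) < n:
        candidate = lm_sample(prompt)
        if quality_score(candidate) >= threshold:
            examples.append(candidate)
    return examples
```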